Create powerful dataviz with R

Ilya Kashnitsky

14 June 2021

before we start

https://twitter.com/RuJEconomics/status/1191697644499984384

https://www.visualcapitalist.com/wp-content/uploads/2018/03/money-happiness-large.html

materials for the course

https://github.com/ikashnitsky/dataviz-mpidr

rule 0 – DO VISUALIZE YOUR DATA

Anscombe’s Quartet

http://i.imgur.com/QA3Ss8D.png

https://i.imgur.com/Ym0oKHj.png

https://i.imgur.com/T7xz3UN.png

https://i.imgur.com/MxUNZxV.png
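Anscombe's four datasets ship with base R as the `anscombe` data frame (package datasets), so the point of rule 0 can be checked directly. In this minimal sketch, the summary statistics of the four x/y pairs come out almost identical, yet plotting the same pairs shows four very different patterns.

```r
# `anscombe` is a built-in data frame with columns x1..x4 and y1..y4
data(anscombe)

# Summary statistics for each of the four x/y pairs (one column per set)
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))  # the four columns are nearly indistinguishable

# ...but a 2 x 2 grid of scatterplots reveals four different stories
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), main = paste("Set", i))
}
par(op)
```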

rule 1 – text should be horizontal

https://ikashnitsky.github.io/2019/dotplot/
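One common way to keep text horizontal, in the spirit of the dotplot post above, is to put long category labels on the y axis instead of rotating them on the x axis. Below is a minimal ggplot2 sketch. It assumes ggplot2 is installed, and the built-in `mtcars` data and the name `car_mpg` were chosen only for illustration.

```r
library(ggplot2)

# Car model names live in the row names of the built-in mtcars data
car_mpg <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)

# Long labels on the y axis are read horizontally; reorder() sorts the
# models by mpg so the ranking is visible at a glance
p <- ggplot(car_mpg, aes(x = mpg, y = reorder(model, mpg))) +
  geom_point() +
  labs(x = "Miles per gallon", y = NULL)
print(p)
```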

example of a figure improvement

https://doi.org/10/ggbtpx

https://gist.github.com/ikashnitsky/2800295e304b4858be553432de4a0d11

https://twitter.com/ikashnitsky/status/1192901559568523269

rule 2 – on slides, text should be as large as possible

https://i.imgur.com/u1nCiJt.jpg

https://i.imgur.com/bYePASN.jpg

https://i.imgur.com/nvNEc2F.jpg

https://i.imgur.com/KpBVKP1.png
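In ggplot2, the quickest way to enlarge all text at once is the `base_size` argument of the complete themes. The sketch below assumes ggplot2 is installed. The value 24 is an assumed starting point for slides, not a fixed rule.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  # base_size scales axis titles, tick labels and legend text together;
  # 24 pt is an assumed value for slides, adjust to your slide layout
  theme_minimal(base_size = 24)
print(p)
```

When exporting with `ggsave()`, the `width` and `height` arguments set the physical size of the figure. Keeping them close to the size the figure will occupy on the slide keeps the text size predictable.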

rule 3 – mind colors, especially regarding colorblind friendliness
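As a minimal sketch, ggplot2 can use a colorblind-friendly palette that already ships with R. Since R 4.0.0, `grDevices::palette.colors()` includes the Okabe-Ito palette. The sketch assumes ggplot2 is installed and R is 4.0.0 or later, and `mtcars` is used only for illustration. The built-in `scale_colour_viridis_d()` in ggplot2 is another option.

```r
library(ggplot2)

# Okabe-Ito palette from grDevices (R >= 4.0.0); unname() so that
# scale_colour_manual() matches colours by position, not by name
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  # skip the first entry (black) and use the next three colours
  scale_colour_manual(values = okabe_ito[2:4], name = "Cylinders")
print(p)
```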

rule 4 – highlight what’s important for the story

https://barcanumbers.wordpress.com/2018/12/06

https://www.ft.com/content/a26fbf7e-48f8-11ea-aeb3-955839e06441
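A common highlighting pattern is to draw all the context in light grey and the part of the story in one strong colour on top. The sketch below assumes ggplot2 is installed. The built-in `ChickWeight` data and the choice to highlight diet 3 are only illustrations.

```r
library(ggplot2)

# Built-in ChickWeight data: growth curves of 50 chicks on 4 diets;
# as.data.frame() drops the extra "groupedData" classes
cw <- as.data.frame(ChickWeight)
focus <- cw$Diet == "3"  # illustrative choice of what to highlight

p <- ggplot(mapping = aes(x = Time, y = weight, group = Chick)) +
  # context layer: everything else, in light grey
  geom_line(data = cw[!focus, ], colour = "grey80") +
  # story layer: added last so it is drawn on top
  geom_line(data = cw[focus, ], colour = "firebrick") +
  labs(x = "Days since birth", y = "Body weight (g)",
       title = "Chicks on diet 3 (red) vs all other diets (grey)")
print(p)
```

Splitting the data into two layers is a deliberate choice. A single layer with a colour mapping would not guarantee that the highlighted lines end up on top of the grey ones.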

rule 5 – plots don’t have to be overly complicated to be powerful

https://fivethirtyeight.com/features/why-the-oldest-person-in-the-world-keeps-dying

a great example

(not a rule) suggestion

When possible and meaningful for your story – animate

https://www.ft.com/video/83703ffe-cd5c-4591-9b4f-a3c087aa6d19

Or make it completely interactive

https://jschoeley.shinyapps.io/hmdexp

Dataviz principles

https://policyviz.com/2018/08/07/dataviz-cheatsheet/

https://ft-interactive.github.io/visual-vocabulary/

https://serialmentor.com/dataviz/

http://socviz.co/index.html

Jonas Schöley’s slides

https://github.com/jschoeley/idem_viz

Tidyverse

The most influential R developer

Hadley Wickham


tidyverse

https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/

https://www.tidyverse.org/

tidy data

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). Retrieved from http://www.jstatsoft.org/v59/i10

Tidy data is a standard way of mapping the meaning of a dataset to its structure.

A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types.

In tidy data:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.
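Here is a small illustration of the first two rules, assuming the tidyr and tibble packages (both part of the tidyverse) are installed. The toy table is hypothetical. Storing years as column headers spreads the variable "year" across several columns. `pivot_longer()` gathers it into one column, so each row becomes one country-year observation.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in "wide" (messy) form: one column per year
messy <- tibble(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 22)
)

# Tidy form: one column per variable (country, year, pop),
# one row per observation (a country in a given year)
tidy <- pivot_longer(messy, cols = c(`2019`, `2020`),
                     names_to = "year", values_to = "pop")
print(tidy)
```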

On pipes
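A pipe passes the result of one step as the first argument of the next, so a chain reads left to right instead of inside out. The minimal sketch below uses the magrittr pipe `%>%`, which the tidyverse re-exports, and assumes magrittr is installed. R 4.1.0 and later also ship a native pipe `|>` that behaves the same way in simple chains like this one.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested calls: read from the innermost function outwards
nested <- round(mean(sqrt(x)), 1)

# The same computation as a pipe: read step by step, left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

print(c(nested = nested, piped = piped))
```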

Examples and exercises

Please follow along in the R script “tidy.R”

Visualizing data with ggplot2

A bit more motivation

https://graphics.reuters.com/SOCCER-EURO/yzdvxmjjnpx/

https://www.granvillematheson.com/post/self-portrait

More

Plotting systems in R?

  • base
  • lattice
  • ggplot2
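For comparison, here is the same scatterplot in each of the three systems. This is a minimal sketch using the built-in `mtcars` data, chosen only for illustration. lattice ships with R as a recommended package, and ggplot2 is assumed to be installed.

```r
library(lattice)
library(ggplot2)

# 1. base graphics: draws immediately on the current device
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 2. lattice: returns a "trellis" object that is drawn when printed
p_lattice <- xyplot(mpg ~ wt, data = mtcars,
                    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p_lattice)

# 3. ggplot2: also returns an object, built from data, aesthetics and layers
p_gg <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p_gg)
```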

“The winner takes it all”


Strengths of the base plotting system

  • Base R usually knows how to plot an object out of the box
  • Extremely easy to use if you are happy with the default settings
  • BUT ggplot2 now has the autoplot() function
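Base R can plot many objects out of the box because of S3 dispatch. `plot()` is a generic function, and many classes come with their own method. This minimal sketch uses two built-in cases, a kernel density estimate and a fitted linear model. The datasets were chosen only for illustration.

```r
# density() returns an object of class "density";
# plot() dispatches to the matching method and draws the curve
d <- density(faithful$eruptions)
plot(d)

# For a fitted "lm" object, plot() draws the regression diagnostic plots
fit <- lm(mpg ~ wt, data = mtcars)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

On the ggplot2 side, `autoplot()` is the analogous generic. Whether a given class has an `autoplot()` method depends on the package that provides it.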

What makes ggplot2 special?

“gg” means “Grammar of graphics”

http://www.springer.com/us/book/9780387245447
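In the grammar of graphics, a plot is assembled from independent components rather than picked from a menu of chart types. In the minimal ggplot2 sketch below, each line adds one component. It assumes ggplot2 is installed, and the built-in `mtcars` data is used only for illustration.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                # data
            mapping = aes(x = wt, y = mpg,                # aesthetic mappings
                          colour = factor(cyl))) +
  geom_point() +                                          # geometric object
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                               # statistical transformation
  facet_wrap(~ am, labeller = label_both) +               # faceting
  scale_colour_viridis_d(name = "Cylinders") +            # scale
  theme_bw()                                              # theme
print(p)
```

Any single component can be swapped without touching the others. For example, replacing `facet_wrap()` with `facet_grid()` leaves the data, mappings and layers unchanged.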

Extremely big and helpful community

  • Help
  • Examples
  • Rapid development
  • Extensions

ggplot2 resources

https://ggplot2.tidyverse.org/reference/

https://exts.ggplot2.tidyverse.org

https://pkg.garrickadenbuie.com/gentle-ggplot2/#1

https://emitanaka.org/workshopUTokyo2018/day1-session02-datavis.html#1

https://www.r-graph-gallery.com/cartogram.html

One last hint before we start coding

https://github.com/dreamRs/esquisse

ggplot2 show

Please follow along in the R script “ggplot2.R”